Weakly Supervised Representation Learning for Audio-Visual Scene Analysis
Authors
Abstract
Related Papers
Weakly Supervised PatchNets: Learning Aggregated Patch Descriptors for Scene Recognition
In this paper, we propose a hybrid representation, which leverages the great discriminative capacity of CNNs and the efficiency of a descriptor encoding scheme for scene recognition. We make three main contributions. First, we train an end-to-end PatchNet in a weakly supervised manner, in order to extract the discriminative deep descriptors of local patches. Second, we design a novel VSAD encoding ap...
Scene Parsing by Weakly Supervised Learning with Image Descriptions
This paper investigates a fundamental problem of scene understanding: how to parse a scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations). We propose a deep architecture consisting of two networks: i) a convolutional neural network (CNN) extracting the image representation for pixel-wise object labeling and ii) a recursive neural netw...
Weakly-Supervised Video Scene Co-parsing
In this paper, we propose a scene co-parsing framework to assign pixel-wise semantic labels in weakly-labeled videos, i.e., only video-level category labels are given. To exploit rich semantic information, we first collect all videos that share the same video-level labels and segment them into supervoxels. We then select representative supervoxels for each category via a supervoxel ranking proce...
Weakly-supervised Dictionary Learning
We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly l...
Multimodal Visual Concept Learning with Weakly Supervised Techniques
Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) fram...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2020
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/taslp.2019.2957889